Density-Distance Outlier Detection Algorithm Based on Natural Neighborhood

نویسندگان

چکیده

Outlier detection is of great significance in the domain data mining. Its task to find those target points that are not identical most object generation mechanisms. The existing algorithms mainly divided into density-based and distance-based algorithms. However, both approaches have some drawbacks. former struggles handle low-density modes, while latter cannot detect local outliers. Moreover, outlier algorithm very sensitive parameter settings. This paper proposes a new two-parameter (TPOD) algorithm. method proposed this does need manually define number neighbors, introduction relative distance can also solve problem low density further accurately combinatorial optimization problem. Firstly, natural neighbors iteratively calculated, then calculated by adaptive kernel estimation. Secondly, computed through neighbors. Finally, these two parameters combined obtain factor. eliminates influence require users determine outliers themselves, namely, top-n effect. Two synthetic datasets 17 real were used test effectiveness method; comparison with another five provided. AUC value F1 score on multiple higher than other algorithms, indicating be found accurately, which proves effective.

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Outlier Detection Based On Neighborhood Proximity

Outliers, also called anomalies are data patterns that do not conform to the behavior that is expected or differ too much from the rest. In some cases, outliers could be caused by errors in data generating/collecting methods or by inherent data variability. However, in many situations, outliers are indications of interesting events that have never been known before and hence, an adaptation of t...

متن کامل

An outlier detection algorithm based on clustering

Outlier detection is a very important type of data mining, which is extensively used in application areas. The traditional cell-based outlier detection algorithm not only takes a large amount of time in processing massive data, but also uses lots of machine resources, which results in the imbalance of the machine load. This paper presents an distancebased outlier detection algorithm. These expe...

متن کامل

A Study on Distance-based Outlier Detection on Uncertain Data

Uncertain data management, querying and mining have become important because the majority of real world data is accompanied with uncertainty these days. Uncertainty in data is often caused by the deficiency in underlying data collecting equipments or sometimes manually introduced to preserve data privacy. The uncertainty information in the data is useful and can be used to improve the quality o...

متن کامل

Rapid Distance-Based Outlier Detection via Sampling

Distance-based approaches to outlier detection are popular in data mining, as they do not require to model the underlying probability distribution, which is particularly challenging for high-dimensional data. We present an empirical comparison of various approaches to distance-based outlier detection across a large number of datasets. We report the surprising observation that a simple, sampling...

متن کامل

Distance-based Outlier Detection in Data Streams

Continuous outlier detection in data streams has important applications in fraud detection, network security, and public health. The arrival and departure of data objects in a streaming manner impose new challenges for outlier detection algorithms, especially in time and space efficiency. In the past decade, several studies have been performed to address the problem of distance-based outlier de...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Axioms

سال: 2023

ISSN: ['2075-1680']

DOI: https://doi.org/10.3390/axioms12050425